A device for real-time interest assessment

Information technologies are becoming quite efficient at transmitting data. People, however, are not 
interested in data per se. Rather, people want data that is useful for a particular task, and more 
specifically, people want interesting information. The importance of giving interesting information in 
communication has been noted by various philosophers and scientists, including H. Paul Grice, who 
argued that speakers must make their communication relevant to the listener if communication is to be 
successful. 

The problem of determining whether data are interesting to a receiver has been addressed in different 
ways within different media. In interpersonal communication, listeners provide speakers with verbal and 
non-verbal feedback that indicates the listener's level of interest. In many mass media, such as 
television, multiple channels that offer some variety of information are provided, and the people receiving 
the information select from the available information whatever seems most interesting. People's 
selections are then measured, as by the Nielsen ratings, so that more interesting and new (potentially 
interesting) content can be made more available and content that is not interesting can be made less 
available. 

The interpersonal means of interest level detection has an advantage over the typical mass media means. 
In the interpersonal medium, interest level detection occurs in real time, within a single exchange of 
information rather than between exchanges of information. The speaker can introduce information, 
assess the listener's interest in the information, and then consider the listener's interests when presenting 
subsequent information. Mass media technologies typically rely on less immediate feedback. One cost 
of this is that people have to search through information, looking for something interesting, only to 
discover that sometimes none of the available information is interesting. 

Our invention addresses this problem by assessing and communicating people's level of interest. This 
works as follows. Initially, whether a person is attending to the target information is assessed. If the 
person is not attending to the information, we assume that the person is not interested in the information 
at that time. Attention can be assessed in various ways depending on the particular medium. In visual 
media, for example, people reliably attend to the visual information to which their gaze is directed. 
Therefore, devices that determine at which target a person is looking, such as eye trackers, can be used 
for attention detection in the visual media. 

Next, a person's relative arousal level is assessed. If a person is more aroused when they attend to 
target information than they are when they are not attending to the target information, we assume that the 
person finds that information interesting at that time. Arousal in this case is a general affective state and 
can be assessed in various ways. For example, in interpersonal communication, speakers use facial 
expression as a means of assessing arousal and consequently interest. Therefore, devices that determine 
a person's arousal level, such as facial gesture detectors, can be used to assess arousal. 

By combining data about attention and arousal our device assesses the level of interest a person has in a 
particular information target. This assessment can then be communicated as feedback about the 
information target. 

One use of this device would be for an information presentation technology to receive interest level data 
about various information targets, and then present more information that is similar to the targets that 
were most interesting and present less information that is similar to the targets that were least interesting. 
Please prepare a patent application with the subject disclosure material. This application is 
related to patent application AM9-98-031, "Ticker with Eye Tracking". 

If you have any questions, please contact Myron Flickner at 408-927-1776. You can send him 
mail at the above address, mail stop K57D/B2-250 

Thank you for your efforts on behalf of IBM and my department. 



Dear Sean: 
RS:ljs 



Enclosure 



Law Offices Of 

McGinn & Gibb, P.C. 

A Professional Limited Liability Company 
Patents, Trademarks, Copyrights, and Intellectual Property Law 
1701 Clarendon Boulevard, Suite 100 
Arlington, Virginia 22209 
Telephone: (703) 294-6699 
Facsimile/Data: (703) 294-6696; 294-6698 

E-MAIL: MCGINNGIBB @ AOL.COM 

Sean M. McGinn 
Frederick W. Gibb, III 

July 10, 1998 

VIA FACSIMILE AND AIR MAIL 

Ray Strimaitis, Esq. FAX No.: (408) 927-3375 

Division Counsel 

Intellectual Property Law, Dept. DNGA/J2B 
Almaden Research Center 
Re: Estimate on Preparation of Patent Application 

"A SYSTEM FOR REAL-TIME DETERMINATION OF A 
USER'S LEVEL OF ... INFORMATION" 
IBM Docket No.: AM9-98-093 

Dear Ray: 

Thank you for your letter dated June 22, 1998, in accordance with which I have 
reviewed the invention disclosure and I telephoned Myron Flickner and briefly discussed the 
invention with him. 

I estimate the service fees for preparing a final draft application and formal papers 
suitable for filing to be about $4300 - $4600, absent some unforeseen and voluminous 
extension of the invention. As before, if the actual time spent is less than the estimate, the 
costs would be less. 

Please let me know only if this estimate is not acceptable. We will shortly begin 
preparing the first draft of the application. 

As always, thank you for entrusting this application to our firm. We deeply appreciate 
the opportunity to work again with you, your department, and Almaden's inventors. 

With best regards, 
Sean M. McGinn 
A system for real-time determination of a user's level of interest to 
presented information 

Chris Dryer, Myron Flickner 
Background of the Invention 

The present invention relates to a means of determining the level of interest a user expresses in 
content. In particular the invention shows a means of non-intrusively detecting how interested a 
user is in presented information. In a typical scenario the content comes from broadcast or 
cable TV, the web, a computer application, a talk, a classroom lecture or a play. 

Description of Related Art 

Information technologies have become quite efficient at transmitting data. People, however, 
are not interested in data per se. Rather, people want data that is useful for a particular task, and 
more specifically, people want interesting information. The importance of giving interesting 
information in communication has been noted by various philosophers and scientists, including 
H. Paul Grice [1], who argued that speakers must make their communication relevant to the 
listener if communication is to be successful. 

The problem of determining whether data are interesting to a receiver has been addressed in 
different ways within different media. In interpersonal communication, listeners provide 
speakers with verbal and non-verbal feedback that indicates the listener's level of interest. In 
many mass media, such as television, multiple channels that offer some variety of information are 
provided, and the people receiving the information select from the available information 
whatever seems most interesting. People's selections are then measured, as by the Nielsen 
ratings, so that more interesting and new (potentially interesting) content can be made more 
available and content that is not interesting can be made less available. 

The interpersonal means of interest level detection has an advantage over the typical mass media 
means. In the interpersonal medium, interest level detection occurs in real time, within a single 
exchange of information rather than between exchanges of information. The speaker can 
introduce information, assess the listener's interest in the information, and then consider the 
listener's interests when presenting subsequent information. Mass media technologies typically 
rely on less immediate feedback. One cost of this is that people have to search through 
information, looking for something interesting, only to discover that sometimes none of the 
available information is interesting. Our invention addresses this problem by assessing and 
communicating people's level of interest by passively observing them. 

Some related work can be found in the patent literature. In patent 5649061, Smyth describes a 
device for estimating a mental decision. This is done by monitoring a user's gaze direction along 
with EEG and processing the output signals via a neural net to classify an event as a mental 
decision to select a visual cue. In other words the device can detect when a user has decided to 
look at a visual target. The EEG is detected via skin sensors placed on the head. 

In patent 5507291, Stirbl et al. describe a method to remotely determine a person's emotional 
state. This is done by broadcasting a waveform of predetermined frequency and energy at an 
individual and detecting and analysing the emitted energy to determine physiological parameters. 
The physiological parameters, such as respiration, blood pressure, pulse rate, pupil size and 
perspiration levels, can be compared with reference values to provide information indicative of 
the person's emotional state. 

In patent 5762611, Lewis et al. describe a method of evaluating a subject's interest level in 
presentation materials by analysing brain generated event related potential (ERP) and/or event 
related field (ERF) waveforms. Random audio tones are presented to the subject followed by 
measurement of ERP signals. The level of interest was computed from the magnitude of the 
difference between a baseline ERP signal and an ERP signal during a task, in this case video 
presentations. This difference was correlated to the interest level users expressed by filling out a 
questionnaire about the video presentations. ERP measurements require scalp sensors, and 
although the authors suggest using EMF signals would allow this to be done non-intrusively, no 
evidence that this is possible was given. 

In other work, Kamitani and Marutani [10] observed that perplexed behaviours of subjects using 
a word processor resulted in head motion changes more than facial expression changes. They 
used dynamic programming to match head motion with head motion templates of the following 
head gestures: nod, shake, tilt, bend backwards, bend forwards, and no action. When the subject 
stopped typing and displayed appropriate head gestures they detected that the person was in a 
state of confusion. In this case only perplexed behaviours, not a general level of interest, were 
detected. 

Roz Picard from the MIT Media Lab has done some experiments that have shown that people 
naturally lean forward when presented with positive valence information [3]. (I've asked Roz for a 
more detailed written reference.) In this experiment a mouse with a trackpoint was used and the 
forward pressure on the trackpoint was measured then correlated with the valence level of 
presented information. 

Summary of the Invention 

Our invention improves on previous inventions by giving a non-intrusive way of detecting 
interest level, whereas the prior art has required intrusive detection or detects only emotional 
information but not the level of interest. 

The first step in the process is to assess if a person is attending to the target information. If the 
person is not attending to the information, we assume that the person is not interested in the 
information at that time. Attention can be assessed in various ways depending on the particular 
medium. In visual media, for example, people reliably attend to the visual information to which 
their gaze is directed. Therefore, devices that determine at which target a person is looking, such 
as eye trackers, can be used for attention detection in the visual media. Furthermore, it has been 
shown that the duration of fixation time is a strong cue of indicated interest. People look at 
things longer when they are interested in them. 

Next, a person's relative arousal level is assessed. If a person is more aroused when they attend 
to target information than they are when they are not attending to the target information, we 
assume that the person finds that information interesting at that time. Arousal in this case is a 
general affective state and can be assessed in various ways. For example, in interpersonal 
communication, speakers use facial expression as a means of assessing arousal and consequently 
interest. Therefore, devices that determine a person's arousal level, such as facial and body 
gesture detectors, can be used to assess arousal. 

Finally, by combining data about attention and arousal our system assesses the level of interest a 
person has in a particular information target. This assessment can then be communicated as 
feedback about the information target. 
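
The three steps summarised above can be sketched as follows. This is only an illustrative sketch: it assumes attention arrives as a yes/no value from a gaze tracker and that arousal has already been reduced to a numeric score, and the function names and the simple "above baseline" rule are assumptions made for the example rather than part of the disclosure.

```python
from statistics import mean

def relative_arousal(attending_samples, baseline_samples):
    """Arousal while attending minus arousal while not attending (assumed numeric scores)."""
    return mean(attending_samples) - mean(baseline_samples)

def is_interested(attending, attending_samples, baseline_samples):
    # Step 1: a person who is not attending is assumed not to be interested at that time.
    if not attending:
        return False
    # Steps 2 and 3: interest is assumed when arousal rises above the non-attending baseline.
    return relative_arousal(attending_samples, baseline_samples) > 0.0
```

For instance, `is_interested(True, [0.8, 0.9], [0.3, 0.4])` reports interest, while the same arousal samples with `attending=False` do not.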

There is no generally agreed upon psychological definition of interest. We define interest as a 
state of high arousal and high attention. Subjects are less interested when they have low arousal 
(when sleeping, for example) or low attention (for example, when the eyes are closed there is no 
attention to visual media). Note that the valence of the arousal state is not a factor in the interest 
definition. You can be interested in something that surprises you as well as something that disgusts 
you or confuses you. In our case confusion, perplexity and frustration are just internal states of interest. 

One use of this system would be for an information presentation technology to receive interest 
level data about various information targets, and then present more information that is similar to 
the targets that were most interesting and present less information that is similar to the targets 
that were least interesting. 

Detailed Description 

As described in the summary there are three steps needed to implement the invention. First we 
need to determine what the user is attending to. Second we need to determine the user's arousal 
level. And finally we need to merge the attention information with the arousal level and output a 
level of interest. 

To determine what the user is attending to we track the user's gaze. There are many methods to 
track gaze. A good survey of various methods can be found in Young et al. "Methods and 
Designs: Survey of Eye Movement Recording Methods", Behaviour Research Methods & 
Instrumentation, 1975 Vol. 7 pp 397-429. Since we want to observe gaze unobtrusively we 
prefer a remote camera based technique such as the corneal glint technique taught in patent 
4595990 Garwin et al. entitled "Eye Controlled Information Transfer" and further refined in 
patents 4536670 and 4950069 by Hutchinson. Commercially available systems such as the 
EyeTrac Series 4000 product by Applied Science Labs or the EyeGaze system by LC technology 
can be purchased to implement this aspect of the invention. One improvement on the 
commercial systems allows for more head motion by using a novel person detection scheme 
that uses optical properties of pupils. This is described in [18], in the papers by Ebesawa [4], 
and in patent 5016282 by Tomono, also published in [11]. By finding the person using a wide field 
lens, the high resolution tracking camera can be targeted and avoid getting lost during large fast 
head and upper body motions. The output of the gaze tracker can be processed to give sets of 
fixation locations and durations. This can be done as described in [5] or by purchasing 
commercial packages such as the EYE ANAL [26] package from Applied Science Labs. The 
fixation locations are mapped to applications/content on a screen/television monitor or to objects in a 
3-D environment. The durations are used to rank the fixations to signal the strength of the interest 
level. Longer fixations indicate stronger interest levels. In a room setting the gaze vector can be 
used along with a 3-D model of the room to determine what object the user is looking at. Since we 
now know what the user is looking at, we know what the user is currently attending to as well 
as the history of what the user has attended to. We also know what the user has not yet seen, 
and thus what cannot yet be assigned an interest level. 
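
The mapping of fixations to on-screen content can be illustrated with a short sketch. It assumes fixations are already available as (x, y, duration) tuples and that each piece of content occupies a rectangular screen region; the function name `rank_targets`, the region layout, and the choice to sum fixation durations per target are assumptions made for illustration only.

```python
from collections import defaultdict

def rank_targets(fixations, regions):
    """Map fixations to named screen regions and rank regions by total fixation time.

    fixations: list of (x, y, duration_seconds)
    regions:   dict of name -> (left, top, right, bottom) in screen coordinates
    """
    totals = defaultdict(float)
    for x, y, duration in fixations:
        for name, (left, top, right, bottom) in regions.items():
            if left <= x < right and top <= y < bottom:
                totals[name] += duration  # longer looking is taken as stronger interest
                break
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Regions never fixated have not been seen, so no interest level is assigned to them.
    unseen = [name for name in regions if name not in totals]
    return ranked, unseen
```

Returning the unseen regions separately keeps "not yet looked at" distinct from "looked at and found uninteresting".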

The next step is to determine the user's relative arousal level. Here we use the technique of 
analysing facial gestures from video sequences. Ekman [2] created a system of coding facial 
expressions that has been used to characterize human emotions. Using this system human 
emotions such as fear, surprise, anger, happiness, sadness and disgust can be extracted by 
analysing facial expressions. Computer vision researchers have recently codified the 
computation of these features [19-24]. In addition, by observing head gestures such as 
approval/disapproval nods, yawns, blink rate/duration, pupil size and audio utterances we 
get a measure of the arousal level of the user at the current time. This type of detection has been 
used to detect the onset of sleep in drivers of cars [7,8]; see also U.S. patent 5786765 by S. 
Kumakura et al., "Apparatus for estimating the drowsiness level of a vehicle driver". Conversely, 
multiple approval nods are a strong indication that the user is alert and interested. In this 
implementation we don't integrate speech, but we wish to point out that it can be used to help 
decide the person's affective state. Expressions like "yeah" and "right" indicate strong interest, 
whereas expressions like "bleah" and "yuck" indicate strong disinterest. 

Blink rate can be measured by simple analysis of the output of the pupil detection scheme [18]. 
Whenever both pupils disappear a blink is marked and the duration is measured. The blink rate 
is computed by simply counting the last few blinks over a period of time and dividing by the 
time. A decreasing blink rate together with an increasing blink duration is a strong indicator that 
the user is falling asleep and thus has a low arousal level. 
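
The blink measurement described above might be sketched as follows, assuming the pupil detector yields one boolean per video frame that is true when both pupils are visible. The function names, the fixed frame interval, and the sliding time window are illustrative assumptions.

```python
def detect_blinks(pupils_visible, frame_dt):
    """Return (start_time, duration) in seconds for each run of frames with both pupils missing."""
    blinks = []
    start = None
    for i, visible in enumerate(pupils_visible):
        if not visible and start is None:
            start = i                      # pupils just disappeared: a blink begins
        elif visible and start is not None:
            blinks.append((start * frame_dt, (i - start) * frame_dt))
            start = None                   # pupils reappeared: the blink ends
    return blinks                          # a blink still in progress at the end is not reported

def blink_rate(blinks, now, window_s):
    """Blinks per second, counting blinks that started within the last window_s seconds."""
    recent = [b for b in blinks if b[0] >= now - window_s]
    return len(recent) / window_s
```

Tracking the returned durations alongside the rate allows the falling-rate, rising-duration pattern to be watched for over time.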

Upper body motion can be detected by analysing the motion track of the pupils over time. To 
extract this information we use the technique taught by [10]. We compute the x, y, z and tilt angle of 
the head by simple analysis of the pupil centers. The motion in x and y is computed using a 
finite difference of the left and right pupil center averages. Motion in z can be obtained using 
finite differences on the measured distance between the pupils. The tilt angle motion can be 
computed using finite differences on the angle between the line connecting the pupils and a 
horizontal line. Then the distance between the observed motion and each of the following gesture 
templates is computed using dynamic programming: yes nod, no nod, lean forward, lean backward, 
tilt and no action. The output of this stage is 6 distances, one to each of the 6 gestures. These 
distances are computed over the previous 2 seconds' worth of data and updated each frame. 
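
A minimal sketch of this stage is given below. It assumes pupil centers are supplied as (x, y) image coordinates per frame, uses the inter-pupil distance as a stand-in for depth, and uses dynamic time warping as the dynamic-programming match; the text names dynamic programming but not a specific variant, so that choice, the function names, and the template values in the usage note are assumptions.

```python
import math

def head_pose(left, right):
    """Return (x, y, z_proxy, tilt) from the left and right pupil centers."""
    cx = (left[0] + right[0]) / 2.0        # x: average of the pupil centers
    cy = (left[1] + right[1]) / 2.0        # y: average of the pupil centers
    dx, dy = right[0] - left[0], right[1] - left[1]
    # Inter-pupil distance stands in for z; tilt is the angle of the pupil line to horizontal.
    return (cx, cy, math.hypot(dx, dy), math.atan2(dy, dx))

def motion(poses):
    """Frame-to-frame finite differences of consecutive pose tuples."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(poses, poses[1:])]

def dtw_distance(seq, template):
    """Dynamic-programming (dynamic time warping) distance between two motion sequences."""
    inf = float("inf")
    n, m = len(seq), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def gesture_distances(seq, templates):
    """Distance from the observed motion to every gesture template."""
    return {name: dtw_distance(seq, t) for name, t in templates.items()}
```

In use, `seq` would hold the motion vectors from the most recent 2 seconds of frames and `templates` would hold one hand-built or recorded motion sequence per gesture; the smallest distance indicates the closest gesture.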

To extract information from facial gestures we look at the eyebrow and mouth region of the 
person's face. The pupil finding technique tells us where the pupils of a person are. From this 
information and a simple face model we extract regions of the eyebrows and the region of the 
lips. 

To identify the eyebrows two rectangular regions are extracted using the line connecting the two 
pupils as shown in Figure 1. Aligning the rectangles to the line connecting the pupils allows for 
side to side head rotation (the "I don't know" gesture) and establishes an invariant coordinate system. 
The regions are thresholded to segment the eyebrows from the underlying skin. The coordinates 
of the inside (medial) and outside (temporal) points of the largest blob are found and the 
perpendicular distances between these points and the baseline are computed. To allow for 
invariance to up and down rotation (the "yes" gesture movement) the ratio of the distances is 
computed. The muscles of the face act only on the medial point; the temporal point remains fixed 
on the head, but its distance will change due to perspective from up/down head rotation. The 
ratio of the distances reflects changes due to the medial point from face muscles and not head 
motion. 
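
The eyebrow measurement can be sketched as below, assuming the medial and temporal eyebrow points and both pupil centers have already been located as (x, y) image coordinates. The function names are hypothetical, and the sketch does not guard against a temporal point lying exactly on the baseline.

```python
import math

def perpendicular_distance(point, left_pupil, right_pupil):
    """Distance from a point to the baseline through the two pupil centers."""
    (x0, y0), (x1, y1), (x2, y2) = point, left_pupil, right_pupil
    numerator = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1))
    return numerator / math.hypot(x2 - x1, y2 - y1)

def eyebrow_ratio(medial, temporal, left_pupil, right_pupil):
    """Ratio of the medial to the temporal eyebrow distance from the pupil baseline."""
    d_medial = perpendicular_distance(medial, left_pupil, right_pupil)
    d_temporal = perpendicular_distance(temporal, left_pupil, right_pupil)
    return d_medial / d_temporal
```

Because both distances shrink or grow together when the whole face changes apparent size, the ratio stays the same under uniform scaling and rises only when the medial point moves away from the baseline relative to the temporal point.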

To locate the mouth we again use the coordinate system aligned to the 
line between the pupils. Here we seek to find the corners of the mouth. This is done by 
searching for corners using a corner detection scheme. Here we use the eigenvalues of the 
windowed second moment matrix as outlined on pages 334-338 of [17]. Then the perpendicular 
distance between the mouth corner and the baseline between the pupils is computed. 
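
One possible form of such a corner detector is sketched below in plain Python for a small grayscale image held as a list of rows. Scoring each pixel by the smaller eigenvalue of the windowed second moment matrix is an assumption (the text says only that the eigenvalues are used), as are the central-difference gradients, the square window, and the function names.

```python
import math

def corner_response(img, radius=1):
    """Smaller eigenvalue of the windowed second moment matrix at every pixel."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference image gradients; border pixels keep a zero gradient.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = b = c = 0.0
            # Accumulate [[a, b], [b, c]] over the window, clipped at the image edge.
            for v in range(max(0, y - radius), min(h, y + radius + 1)):
                for u in range(max(0, x - radius), min(w, x + radius + 1)):
                    a += ix[v][u] * ix[v][u]
                    b += ix[v][u] * iy[v][u]
                    c += iy[v][u] * iy[v][u]
            # Closed-form smaller eigenvalue of a symmetric 2x2 matrix.
            resp[y][x] = (a + c) / 2.0 - math.hypot((a - c) / 2.0, b)
    return resp

def strongest_corner(img, radius=1):
    """(row, column) of the pixel with the largest corner response."""
    resp = corner_response(img, radius)
    coords = ((y, x) for y in range(len(img)) for x in range(len(img[0])))
    return max(coords, key=lambda p: resp[p[0]][p[1]])
```

The smaller eigenvalue is near zero on flat regions and along straight edges and is large only where the intensity changes in two directions, which is what distinguishes a corner.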

To summarise, the features we have extracted are as follows: what the user is looking at, his 
blink rate, six distances to six head gestures, the relative position of his eyebrows, and the 
relative position of the corners of his mouth. The next step is to merge this information into a 
measure of interest level. This is accomplished by a neural net with 11 inputs (blink rate, 
gesture distances, eyebrow distances, and mouth distances), 20 hidden units and 3 outputs. The 
outputs correspond to interested, uninterested and neutral. 
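
The shape of such a network can be illustrated with a forward pass in plain Python. Everything beyond the 11-20-3 layout is an assumption made for the sketch: the split of the 11 inputs into one blink rate, six gesture distances, two eyebrow distances and two mouth distances, the tanh hidden units, the softmax output, and the random untrained weights. A real system would first train the weights on labelled examples.

```python
import math
import random

LABELS = ("interested", "uninterested", "neutral")

def init_layer(n_in, n_out, rng):
    """One (weights, bias) pair per output unit, with small random weights."""
    return [([rng.uniform(-0.5, 0.5) for _ in range(n_in)], 0.0) for _ in range(n_out)]

def forward(layer, inputs, activation):
    """Apply one fully connected layer followed by an activation function."""
    return [activation(sum(w * x for w, x in zip(weights, inputs)) + bias)
            for weights, bias in layer]

def softmax(values):
    peak = max(values)                      # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, hidden_layer, output_layer):
    """Return the most likely label and the three output scores."""
    if len(features) != 11:
        raise ValueError("expected 11 input features")
    hidden = forward(hidden_layer, features, math.tanh)
    scores = softmax(forward(output_layer, hidden, lambda v: v))
    best = max(range(len(LABELS)), key=lambda i: scores[i])
    return LABELS[best], scores

rng = random.Random(0)                      # fixed seed so the untrained example is repeatable
hidden_layer = init_layer(11, 20, rng)
output_layer = init_layer(20, 3, rng)
```

Calling `classify([0.2] + [1.0] * 6 + [0.5] * 4, hidden_layer, output_layer)` returns one of the three labels together with scores that sum to one; with untrained weights the label itself carries no meaning.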

Operation 
Claims 



What is claimed is: 
1. A means of non-intrusive detection of a user's level of interest in presented information, 
comprising: 

a means of detecting to what a user is attending; 
a means of measuring a user's relative arousal level; and 

a means of combining arousal level and attention to produce a level of interest. 



2. A means of non-intrusively detecting the object of a user's interest in presented information, 
comprising: 

a means of detecting the object the user is attending to; 
a means of measuring the user's relative arousal level; and 

a means of combining arousal level and attention to produce the object of interest. 
Please mark up the application and drawings as appropriate and confirm my labeling on 
the drawings. Please fax me the marked-up first draft at the facsimile number above. 

In the application, I have left "holes" for you and the co-inventors to describe features 
of the invention which require further explanation/description. Additionally, I would 
welcome any further features and embodiments which you or the other co-inventors may wish 
to offer. 

In the Background section, I propose simply identifying the patented conventional 
systems as simply conventional. This will avoid any admissions or estoppels which could be 
held against us later. We can submit the patents to the Examiner at the time of filing to meet 
our duty of disclosure. 

Mr. Myron Flickner 
Page 2 

November 13, 1998 

Lastly, I have included some preliminary claims for discussion. Please review them for 
the desired scope and accuracy as to my present understanding of the invention. As shown, I 
have included independent claims directed to a system and to a method, as well as some 
dependent claims which further limit the independent claims. Please let me know if any other 
features should be claimed. I will add more claims in the second draft including some signal 
medium claims. 

I look forward to receiving your comments and instructions. I will forward a second 
draft to you as soon as I receive your comments. 

Thanks again for all of your help. 

With best regards, 

Very truly yours, 
Sean M. McGinn 
SEAN M. MCGINN 
FREDERICK W. GIBB, III 



Mr. Myron Flickner 

IBM Almaden Research Center 

Mail Stop K57-B2250 

650 Harry Road 

San Jose, CA 95120-6099 



VIA EXPRESS MAIL 



Re: Second Draft Application 

"METHOD AND SYSTEM FOR REAL-TIME DETERMINATION OF 
A USER'S LEVEL OF INTEREST TO PRESENTED INFORMATION" 
IBM Docket No: AM9-98-093 
Our Ref: ALM.008 

Dear Myron: 

Thank you for your comments received on December 14, 1998. Enclosed is a copy of the 
second draft patent application in the above docket including the informal drawings. Please have 
all of the inventors closely review the Application and fax back the marked-up second draft to 
one of the above listed numbers. Please note the Application still needs publication dates of 
some of the references. 

I believe the Application is close to being in final form. I reordered the drawings to be 
more logical and I also added a number of new claims for more complete coverage. Shortly 
after we file the case, we will need an original of the photograph of Figure 2. 

Please let me know of any other prior art (patents, publications, etc.) which we should 
bring to the attention of the U.S.P.T.O. This ultimately will provide a much stronger 
subsequently-issued patent. 

I look forward to receiving your comments and instructions. 



Very truly yours, 




Sean M. McGinn 



SMM/ap 
Enclosures 

cc: Ray Strimaitis, Esq. 

Division Counsel, Almaden Research Center 
SEAN M. MCGINN 
FREDERICK W. GIBB, III 



January 22, 1999 



Mr. Myron Flickner 

IBM Almaden Research Center 

Mail Stop K57-B2250 

650 Harry Road 

San Jose, CA 95120-6099 



Dear Myron: 

Enclosed are three copies of the final draft applications in the above dockets. Please note 
that we changed the first page of the application in the AM9-98-031 docket, since the title 
in the -093 docket was changed. This was the only change to the -031 application 
previously sent to you. 

After signing the papers in both cases, please overnight or Express Mail all the papers 
(application, drawings, Declaration, and Assignment) to me. Also, please provide me with any 
other relevant prior art (patents, publications, etc.) which you believe an Examiner may find 
relevant to his decision to grant a patent for your invention. I thank you for the references 
already received. 

Specifically, in order to satisfy the strictly enforced duty of disclosure under U.S. patent 
law, please promptly advise us of any prior art information which is now known or which may 
become known to those involved in the preparation or prosecution of this application, and which 
the Examiner may deem relevant to patentability of the claims. Such information should include 
any commonly assigned patents and pending applications disclosing and/or claiming closely 
related subject matter. 



Re: Final Draft Applications 

IBM Docket No: AM9-98-093 and AM9-98-031 
Our Ref: ALM.008 and ALM.002 



Very truly yours, 




Sean M. McGinn 



SMM/ap 



Enclosures 



cc: Ray Strimaitis, Esq. 
Division Counsel 



